Using latent semantic indexing for morph-based spoken document retrieval
Abstract
Previously, phone-based and word-based approaches have been used for spoken document retrieval. The former suffer from high error rates and the latter from the limited vocabulary of the recognizer. Our method relies on an unlimited-vocabulary continuous speech recognizer that uses morpheme-like units discovered in an unsupervised manner. The morpheme-like units, or “morphs” for short, have also been used successfully as...
Similar resources
Improved Chinese spoken document retrieval with hybrid modeling and data-driven indexing features
Different models retrieve the documents based on different approaches of extracting the underlying content. Different levels of indexing features also offer different functionalities and discriminabilities when retrieving the documents. In this paper, we present results for Chinese spoken document retrieval with hybrid models to integrate the knowledge obtainable from three basic retrieval mode...
Indexing Audio Documents by using Latent Semantic Analysis and SOM
This paper describes an important application for state-of-the-art automatic speech recognition, natural language processing and information retrieval systems. Methods for enhancing the indexing of spoken documents by using latent semantic analysis and self-organizing maps are presented, motivated and tested. The idea is to extract extra information from the structure of the document collection an...
Fusion of Semantic and Acoustic Approaches for Spoken Document Retrieval
Most spoken document retrieval systems use the words derived from a large vocabulary speech recognizer as the internal representation for indexing the document. However, the use of recognition transcripts inherently limits the performance of the system since the size of the dictionary restricts the number of queries for which matches can be found. In this paper we present a new approach to this...
Thematic indexing of spoken documents by using self-organizing maps
A method is presented to provide a useful searchable index for spoken audio documents. The task differs from traditional (text) document indexing, because large audio databases are decoded by automatic speech recognition and decoding errors occur frequently. The idea in this paper is to take advantage of the large size of the database and select the best index terms for each document with th...